Abstract. The significance of having detected an astrophysical gamma ray source is usually calculated by means of a formula derived by Li & Ma in 1983. We solve the same problem in terms of Bayesian statistics, which provides a logically more satisfactory framework. We do not use any subjective elements in the present version of Bayesian statistics. We show that for large count numbers and a weak source the Li & Ma formula agrees with the Bayesian result. For other cases the two results differ, both due to the mathematically different treatment and the fact that only Bayesian inference can take into account prior knowledge.
1. Introduction



Consider an astronomical gamma ray observation aiming to detect a source. The existence of a source in a so-called on-region is judged by the count number $N_{on}$ originating from that region. The counts in it are due to a possible source and the background. The latter is determined by the count number $N_{off}$ in some off-region. It must be chosen in such a way that one can exclude a priori that it contains a source. Hence, we use a physically motivated choice of on- and off-regions and not a blind search. One also knows the expected ratio $\alpha$ of the count numbers if there is no source in the on-region. The number $\alpha$ is given by the ratio of the sizes of the two regions, the ratio of the exposure times for both regions and the respective acceptances.
Given $(\alpha, N_{on}, N_{off})$, the question is how significantly a possible source has been detected. A positive identification obviously requires $N_{on} > \alpha N_{off}$. Li & Ma (1983) discuss several possible estimates of the significance. Estimating it as the ratio of excess counts above background to the background's standard deviation yields (Li & Ma 1983, eq. (5))

$$S_{LM1} = \frac{N_{on} - \alpha N_{off}}{\sqrt{N_{on} + \alpha^2 N_{off}}} . \tag{2}$$
However, one could as well argue that the desired measure of significance should correspond to the probability that all counts were due to the background. That yields (Li & Ma 1983, eq. (9)):

$$S_{LM2} = \frac{N_{on} - \alpha N_{off}}{\sqrt{\alpha\,(N_{on} + N_{off})}} . \tag{3}$$

Li & Ma argue that for $\alpha < 1$, $S_{LM1}$ underestimates the significance, whereas $S_{LM2}$ overestimates it. They finally advocate the significance $S_{LM}$ (Li & Ma 1983, eq. (17)) in the form

$$S_{LM} = \sqrt{2}\,\left[ N_{on} \ln\left(\frac{1+\alpha}{\alpha}\,\frac{N_{on}}{N_{on}+N_{off}}\right) + N_{off} \ln\left((1+\alpha)\,\frac{N_{off}}{N_{on}+N_{off}}\right) \right]^{1/2} . \tag{4}$$

As a function of the random variables $N_{on}$ and $N_{off}$ this is itself a random variable. If no source is present this variable is nearly normally distributed even for small count numbers (according to the authors for $N_{on}, N_{off} \gtrsim 10$). For a single measurement (given by the numbers $\alpha$, $N_{on}$ and $N_{off}$) one can interpret $S_{LM}$ as statistical significance. The argument of Li & Ma hinges on the fact that $S_{LM}$ has a normal distribution. They have tested this by Monte Carlo methods.
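The two closed-form estimates quoted above, eq. (3) and eq. (4), are easy to evaluate directly. The following is a minimal Python sketch, not code from the paper; the function names `s_lm2` and `s_lm` are chosen here for illustration, and the count numbers in the demo loop are assumptions picked to resemble the figures.

```python
import math


def s_lm2(n_on, n_off, alpha):
    """Eq. (3): excess counts divided by the background-only standard deviation."""
    return (n_on - alpha * n_off) / math.sqrt(alpha * (n_on + n_off))


def s_lm(n_on, n_off, alpha):
    """Eq. (4): the significance advocated by Li & Ma (needs n_on, n_off > 0)."""
    total = n_on + n_off
    term_on = n_on * math.log((1.0 + alpha) / alpha * n_on / total)
    term_off = n_off * math.log((1.0 + alpha) * n_off / total)
    # Rounding may push the sum marginally below zero when n_on == alpha * n_off.
    return math.sqrt(max(0.0, 2.0 * (term_on + term_off)))


if __name__ == "__main__":
    # Illustrative numbers only: alpha = 0.25, N_off = 2000.
    for n_on in (520, 560, 600):
        print(n_on, round(s_lm2(n_on, 2000, 0.25), 3), round(s_lm(n_on, 2000, 0.25), 3))
```

Note that eq. (4) is non-negative by construction, so it is only meaningful as a detection significance when $N_{on} > \alpha N_{off}$.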

In the present paper we define and evaluate the significance Sb of the existence of a source in terms of Bayesian statistics. 
We do so for several reasons. 

- We consider Bayesian statistics to provide a logically more satisfactory inference than the arguments of classical statistics 
used by Li & Ma. 

- Bayesian significance does not leave a choice between several definitions of significance. We do not consider the prior 
distribution to be a subjective element in statistical inference, nor do we take it to be uniform either. Rather we define it by a 
formal rule which is based on a symmetry principle. This may be called an objective Bayesian approach. 

- Bayesian statistics do not require a random variable that has an approximately normal distribution. Bayesian inference is 
therefore valid for any count number. It does not require verification by Monte Carlo methods. 

The classical significance $S_{LM}$ and the Bayesian significance $S_B$ do not have the same meaning. The first expresses a probability that the assumption "there is no source" conflicts with observation. The corresponding test function can be defined in various ways. The second expresses the probability that the intensity of the source is larger than zero. This probability is taken from a posterior distribution of the intensity parameter, which is a well-defined result of Bayesian inference. Although the two quantities do not have the same meaning, we compare the numerical values because the application of Bayesian statistics is not common practice and there is a limiting situation in which both values agree. It occurs in the frequent case when the source is weak and the count numbers are high.



2. Basics of Bayesian statistics 

2.1. Problems depending on one parameter

Bayesian statistics provides a way to infer physical parameters from observed data. The dependence of the observed quantities on the parameters is statistical. Hence, it is described in terms of probability distributions. In the following we shall use the Poisson distribution

$$p_P(n|\lambda) = \frac{\lambda^n}{n!}\, e^{-\lambda} \tag{5}$$

and the binomial distribution

$$p_B(n|\lambda;N) = \binom{N}{n}\, \lambda^n (1-\lambda)^{N-n} . \tag{6}$$

The parameter is a real number $\lambda$, the observed datum is a whole number $n$. In order to derive the parameter, the conditional distribution $p(n|\lambda)$ must be proper so that

$$\sum_n p(n|\lambda) = 1 . \tag{7}$$

The Poisson and the binomial models are proper. The probability for the parameter to have the value $\lambda$ is found by means of Bayes' theorem (see footnote 1):

$$P(\lambda|n) = \frac{p(n|\lambda)\,\mu(\lambda)}{\int p(n|\lambda')\,\mu(\lambda')\,d\lambda'} . \tag{8}$$

The posterior distribution $P(\lambda|n)$ contains the information one can deduce from the data. It is a distribution of the parameter given the data whereas the model $p(n|\lambda)$ is a distribution of the data given the parameter.

Bayes' theorem does not determine the so-called prior distribution $\mu(\lambda)$ in equation (8). However, demanding in addition a symmetry for the model yields the prior distribution: in order to ensure an unbiased inference of $\lambda$ in the sense that the information obtained on $\lambda$ does not depend on the actually true value of $\lambda$, one demands that the distribution is form-invariant. This means that there is a group of transformations that relates the observable $n$ to the parameter $\lambda$. The measure of the group can then be identified with the prior distribution in equation (8); see Harney (2003), chap. 6. The measure of the group is obtained by "Jeffreys' rule" (see Jeffreys 1961, chap. 3):

$$\mu(\lambda) = \left\langle \left(\partial_\lambda \ln p(n|\lambda)\right)^2 \right\rangle_p^{1/2} . \tag{9}$$

Here, $\langle f(n) \rangle_p$ denotes the expectation value of $f$ with respect to the distribution $p$. For the evaluation of the right hand side of equation (9), see sect. A. Under a transformation of the parameters, the measure transforms with the Jacobian of the transformation, so that any derived probabilities are not affected by a reparameterization. The measure $\mu$ is not necessarily a proper distribution. One must only demand that the normalizing integral in equation (8) exists and thus the posterior distribution is proper.

Footnote 1: For improper models the prior distribution needed in Bayes' theorem is not defined.
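Jeffreys' rule, eq. (9), can be checked numerically by summing the squared derivative of the log-model over the data. The sketch below is an illustration in Python, not part of the paper; the helper names and the truncation `n_max` of the Poisson sum are assumptions made here. It reproduces the closed forms derived in appendix A, $\lambda^{-1/2}$ for the Poisson model and $(N/(\lambda(1-\lambda)))^{1/2}$ for the binomial model.

```python
import math


def jeffreys_measure(score, pmf, support):
    """Eq. (9): square root of the expectation of the squared score."""
    return math.sqrt(sum(pmf(n) * score(n) ** 2 for n in support))


def mu_binomial(lam, big_n):
    """Measure of the binomial model p_B(n | lam; N), summed exactly over n."""
    pmf = lambda n: math.comb(big_n, n) * lam**n * (1.0 - lam) ** (big_n - n)
    score = lambda n: n / lam - (big_n - n) / (1.0 - lam)
    return jeffreys_measure(score, pmf, range(big_n + 1))


def mu_poisson(lam, n_max=200):
    """Measure of the Poisson model; the sum is truncated at n_max (assumption)."""
    pmf = lambda n: math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))
    score = lambda n: n / lam - 1.0
    return jeffreys_measure(score, pmf, range(n_max + 1))


if __name__ == "__main__":
    print(mu_binomial(0.3, 20), math.sqrt(20 / (0.3 * 0.7)))
    print(mu_poisson(4.0), 4.0 ** -0.5)
```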

One is usually interested in an error interval for the derived value of the parameter $\lambda$. It can be constructed as a Bayesian interval: given a preselected probability $K$, it is the shortest interval $[\lambda_1, \lambda_2]$ for which

$$\int_{\lambda_1}^{\lambda_2} P(\lambda|n)\,d\lambda = K . \tag{10}$$

It can be shown (see Harney 2003, chap. 3) that if the Bayesian interval is unique, it is defined by some constant $C(K)$ such that the interval contains the points for which

$$\frac{P(\lambda|n)}{\mu(\lambda)} > C(K) . \tag{11}$$

With (8) one sees that $C(K)$ is the level of a contour line of the model $p(n|\lambda)$ taken as function of $\lambda$.

For the problem at hand we need the probability that the Bayesian interval excludes some lower bound $\lambda_{min}$. This can be calculated from the posterior distribution in two steps:

- Find the corresponding Bayesian interval. The lower bound is $\lambda_{min}$, the upper bound $\lambda_{up} > \lambda_{min}$ is found by solving the equation

$$p(n|\lambda_{up}) = p(n|\lambda_{min}) . \tag{12}$$

- The probability is then

$$K(\lambda > \lambda_{min}) = \int_{\lambda_{min}}^{\lambda_{up}} P(\lambda'|n)\,d\lambda' , \tag{13}$$

as any $K$ bigger than that would yield a Bayesian interval that includes $\lambda_{min}$.

For $K$ close to unity it is handy to express it in a different, highly non-linear scale, which we call significance $S$. The conversion is done by

$$\mathrm{erf}\left(\frac{S}{\sqrt{2}}\right) = K(\lambda > \lambda_{min}) , \tag{14}$$

where the error function is defined by

$$\mathrm{erf}\left(\frac{S}{\sqrt{2}}\right) = \frac{1}{\sqrt{2\pi}} \int_{-S}^{S} e^{-t^2/2}\,dt . \tag{15}$$

This yields the significance in the Bayesian context. Note that the term significance is used here in a sense that can be read as 'if the posterior distribution were Gaussian, the probability would correspond to $S$ standard deviations'. A short-hand form of that is 'the significance is $S$ sigma'. It is not required that the posterior distribution is Gaussian. However, the definition (14) is motivated by the fact that for large count numbers the posterior distribution does approach a Gaussian.

The error function in equation (15) is odd. For sufficiently large $S$ it can be approximated by

$$\mathrm{erf}\left(\frac{S}{\sqrt{2}}\right) \approx 1 - \sqrt{\frac{2}{\pi}}\,\frac{1}{S}\,\exp\left(-\frac{S^2}{2}\right) , \qquad (S^2 \gg 1,\ S > 0) . \tag{16}$$
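The conversion between the probability $K$ and the significance $S$ in eqs. (14) to (16) can be sketched in a few lines of Python. This is an illustration only; the function names are assumptions introduced here. It uses the identity $\mathrm{erf}(S/\sqrt{2}) = 2\Phi(S) - 1$, where $\Phi$ is the standard normal distribution function, so that the inverse can be taken with `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def probability_from_significance(s):
    """Eq. (14) read from right to left: K = erf(S / sqrt(2))."""
    return math.erf(s / math.sqrt(2.0))


def significance_from_probability(k):
    """Solve erf(S / sqrt(2)) = K for S, using S = Phi^{-1}((1 + K) / 2)."""
    if not 0.0 <= k < 1.0:
        raise ValueError("K must lie in [0, 1)")
    return _STD_NORMAL.inv_cdf(0.5 * (1.0 + k))


def tail_asymptotic(s):
    """Leading term of 1 - erf(S / sqrt(2)) for S**2 >> 1, as in eq. (16)."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * s * s) / s


if __name__ == "__main__":
    for s in (1.0, 3.0, 5.0):
        exact_tail = math.erfc(s / math.sqrt(2.0))
        print(s, probability_from_significance(s), exact_tail, tail_asymptotic(s))
```

For $K$ extremely close to unity it is numerically safer to work with the tail $1 - K$ instead of $K$ itself, since $K$ then rounds to 1 in double precision.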



2.2. Reducing multi-parametric problems 

The appropriate model may depend on more parameters than are interesting. That means that one has to integrate over the uninteresting parameters. The question arises whether one should integrate first and apply Bayes' theorem then or if the integration should be performed after the application of Bayes' theorem. The second way (obtaining the full posterior distribution first and integrating afterwards) does not provide the measure of the interesting parameters only, although this measure is needed to find the Bayesian interval via equation (11). This difficulty is related to the marginalization paradox (see footnote 2; Dawid 1973). Thus it is reasonable to go to a minor model before applying Bayes' theorem. If the final minor model has only one parameter, one can apply the methods from sect. 2.1.

The minor model which one constructs by integration shall be invariant under a transformation of the integrated parameters. Thus one needs the conditional measure in the integration kernel. It is obtained by Jeffreys' rule if one considers the interesting parameters as fixed. The minor model $q(n|\lambda_1)$ for a model $p(n|\lambda_1, \lambda_2)$ is thus given by

$$q(n|\lambda_1) = \int p(n|\lambda_1, \lambda_2)\,\mu(\lambda_2|\lambda_1)\,d\lambda_2 . \tag{17}$$

Footnote 2: Even if the full measure factorizes into two factors, one depending only on the interesting parameters and the other only on the uninteresting ones, the factors need not be meaningful measures for the minor-dimensional problem (Bernardo 1979). An example can be found in Harney (2003), chap. 12.
3. Solution by means of Bayesian statistics

The expected count number $\lambda_{on}$ in the on-region is due to both background counts and the possible existence of a source. With the expected count number $\lambda_{off}$ in the off-region and the expected count number $\lambda_s$ from the source, one has

$$\lambda_{on} = \alpha \lambda_{off} + \lambda_s , \tag{18}$$

since the expectation values linearly depend upon the intensities.

3.1. The problem in its original parameters 

The probability of observing $N_{on}$ and $N_{off}$ given the independent parameters $\lambda_{on}$ and $\lambda_{off}$ is the product of the Poisson distributions:

$$p_0(N_{on}, N_{off}|\lambda_{on}, \lambda_{off}) = p_P(N_{on}|\lambda_{on}) \cdot p_P(N_{off}|\lambda_{off}) . \tag{19}$$

From this distribution one wants to infer the confidence level to which $\lambda_s = 0$ can be excluded. Hence, $\lambda_s$ must be one of the parameters of the model. Going to the parameters $(\lambda_s, \lambda_{off})$ does not change any of the measures, as the transformation (eq. (18)) has the Jacobian 1. One only has to read $\lambda_{on}$ as $\lambda_{on}(\lambda_s, \lambda_{off})$. The parameter $\lambda_{off}$ is not interesting, and one has to integrate over it as discussed in sect. 2.2. Thus the natural choice seems to be

$$q_0(N_{on}, N_{off}|\lambda_s) = \int p_0(N_{on}, N_{off}|\lambda_s, \lambda_{off})\,\mu_0(\lambda_{off}|\lambda_s)\,d\lambda_{off} . \tag{20}$$

The conditional measure $\mu_0(\lambda_{off}|\lambda_s)$ is calculated in eq. (A.5). Unfortunately $q_0$ is an improper model since $\mu_0$ is not integrable (see sect. B). This problem is somewhat unexpected. It is a consequence of the fact that the measure of the Poisson model (see eq. (A.3)) is improper.



3.2. Transformation to a proper model 

However, a simple transformation circumvents the problem. We define

$$\lambda = \lambda_{on} + \lambda_{off} , \qquad \omega = \frac{\lambda_{on}}{\lambda} , \qquad N = N_{on} + N_{off} . \tag{21}$$

The parameter $\omega$ represents the fraction of the total intensity $\lambda$ in the on-region and has the boundaries

$$\omega_{min} = \frac{\alpha}{1+\alpha} < \omega < 1 . \tag{22}$$

Since one is free to choose the units in which the intensities are measured, the problem can only depend on the relative intensities. This freedom of gauge becomes transparent in the new parameters. The significance can only depend on $\omega$, the total count number only on the uninteresting parameter $\lambda$. When one introduces the new parameters $\omega$ and $\lambda$ into equation (19), one sees explicitly that they are independent, since the model $p_0$ factorizes in the new parameters (see eq. (C.1)) according to

$$p_0(N_{on}, N_{off}|\lambda_s, \lambda_{off}) = p_P(N|\lambda) \cdot p_B(N_{on}|\omega; N) . \tag{23}$$

The total count number is given by Poisson statistics; the subdivision of the counts into on- and off-regions, given a certain $\omega$, is governed by the binomial distribution. Therefore we infer $\omega$ from the binomial model only and consider the total count number $N$ as fixed. In other words, we do not normalize $p_B(N_{on}|\omega; N)$ with respect to $N$. Then $p_B$ is proper. The measure $\mu_B(\omega)$ of $p_B$ is proper (see eq. (A.4)):

$$\mu_B(\omega) = \left(\frac{N}{\omega(1-\omega)}\right)^{1/2} . \tag{24}$$

3.3. Explicit solution 

One can safely apply Bayes' theorem to $p_B$ to obtain

$$P_1(\omega|N_{on}; N) = \frac{p_B(N_{on}|\omega; N)\,\mu_B(\omega)}{\mathcal{N}_1} . \tag{25}$$

Fig. 1. Comparison of $S$ as a function of $N_{on}$ for $\alpha = 0.25$, $N_{off} = 2000$. Significance $S_{LM}$ according to Li & Ma (circles) and Bayesian significance $S_B$ (crosses).

The normalization $\mathcal{N}_1$ is

$$\mathcal{N}_1 = \int_{\omega_{min}}^{1} p_B(N_{on}|\omega; N)\,\mu_B(\omega)\,d\omega = \sqrt{N}\;\frac{\Gamma(\tfrac12 + N_{on})\,\Gamma(\tfrac12 + N_{off}) - N!\,B_{\omega_{min}}(\tfrac12 + N_{on}, \tfrac12 + N_{off})}{N_{on}!\,N_{off}!} , \tag{26}$$

where $B_z(a, b)$ is the incomplete Beta function. Therewith the posterior distribution is:

$$P_1(\omega|N_{on}; N) = \frac{N!\;\omega^{N_{on}-1/2}\,(1-\omega)^{N_{off}-1/2}}{\Gamma(\tfrac12 + N_{on})\,\Gamma(\tfrac12 + N_{off}) - N!\,B_{\omega_{min}}(\tfrac12 + N_{on}, \tfrac12 + N_{off})} . \tag{27}$$

For the calculation of the significance one needs the integral over $P_1$:

$$K(\omega) = \int_{\omega_{min}}^{\omega} P_1(\omega'|N_{on}; N)\,d\omega' = \frac{N!\,\left(B_{\omega}(\tfrac12 + N_{on}, \tfrac12 + N_{off}) - B_{\omega_{min}}(\tfrac12 + N_{on}, \tfrac12 + N_{off})\right)}{\Gamma(\tfrac12 + N_{on})\,\Gamma(\tfrac12 + N_{off}) - N!\,B_{\omega_{min}}(\tfrac12 + N_{on}, \tfrac12 + N_{off})} . \tag{28}$$

The probability that a source has been detected is given by the probability that $\lambda_s > 0$. In the new parameters one wants to determine the confidence level to which one can exclude that $\omega$ equals its lower bound $\omega_{min}$. Hence, one must solve the equation

$$p_B(\omega_{up}) = p_B(\omega_{min}) . \tag{29}$$

This cannot be solved analytically. However, one can prove that for $N_{on}, N_{off} > 0$ exactly one solution $\omega_{up} \neq \omega_{min}$ exists, since the binomial model has then a single maximum and no minima (see sect. D and footnote 3). With $\omega_{up}$ the significance is

$$S_B = \sqrt{2}\;\mathrm{erf}^{-1}\left(K(\omega_{up})\right) , \tag{30}$$

where $\mathrm{erf}^{-1}$ is the inverse of the error function. Due to the appearance of $\omega_{up}$ one cannot evaluate equation (30) any further. However, we can give a Mathematica script which calculates the Bayesian significance $S_B$ in the described way (see sect. F). In figs. 1 and 2 the Bayesian significance is compared to the Li & Ma formula for a set of typical count numbers.
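The procedure of this section can also be followed numerically without a computer algebra system. The sketch below is a hedged Python illustration, not the authors' implementation: it replaces the incomplete Beta functions by Simpson integration of the unnormalised posterior and finds $\omega_{up}$ by bisection. The function name `bayes_significance`, the number of integration steps and the restriction to integer counts with $N_{on} > \alpha N_{off} > 0$ are assumptions made here.

```python
import math
from statistics import NormalDist


def bayes_significance(n_on, n_off, alpha, steps=20000):
    """Numerical sketch of S_B; assumes integer counts, n_on > alpha * n_off > 0."""
    if n_off <= 0 or n_on <= alpha * n_off:
        raise ValueError("need n_off > 0 and n_on > alpha * n_off")
    n = n_on + n_off
    w_min = alpha / (1.0 + alpha)

    def log_like(w):
        # ln p_B(N_on | w; N) without the constant binomial coefficient
        return n_on * math.log(w) + n_off * math.log1p(-w)

    # Second solution w_up of p_B(w_up) = p_B(w_min): bisection between the
    # likelihood maximum n_on / n and the upper boundary 1.
    target = log_like(w_min)
    lo, hi = n_on / n, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid < 1.0 and log_like(mid) > target:
            lo = mid
        else:
            hi = mid
    w_up = lo

    def log_post(w):
        # likelihood times the measure (N / (w (1 - w)))**0.5, constants dropped
        return (n_on - 0.5) * math.log(w) + (n_off - 0.5) * math.log1p(-w)

    # Scale by the largest value on [w_min, 1] so that exp() cannot overflow.
    ref = log_post(max(w_min, (n_on - 0.5) / (n - 1.0)))

    def post(w):
        return 0.0 if w >= 1.0 else math.exp(log_post(w) - ref)

    def simpson(a, b):
        h = (b - a) / steps
        total = post(a) + post(b)
        for i in range(1, steps):
            total += (4.0 if i % 2 else 2.0) * post(a + i * h)
        return total * h / 3.0

    tail = simpson(w_up, 1.0) / simpson(w_min, 1.0)  # equals 1 - K(w_up)
    if tail <= 0.0:
        raise ValueError("tail probability underflowed")
    # erf(S / sqrt(2)) = K is equivalent to Phi(-S) = (1 - K) / 2
    return -NormalDist().inv_cdf(0.5 * tail)


if __name__ == "__main__":
    print(bayes_significance(600, 2000, 0.25))
```

Working with the tail $1 - K$ instead of $K$ keeps the result usable for high significances, where $K$ itself would round to 1.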

Footnote 3: If $N_{on} = 0$, any Bayesian interval includes $\omega_{min}$ and one cannot, with any probability, affirm the existence of a source. The case $N_{off} = 0$ entails a Bayesian interval including $\omega = 1$. Then one cannot affirm the absence of a source with any probability.

Fig. 2. Comparison of $S$ as a function of $N_{on}$ for $\alpha = 0.25$, $N_{off} = 2000$. The difference $\Delta S = S - S_{LM}$ is shown for $S = S_B$ (dots) and the two estimates $S = S_{LM1}$ (equation (2), triangles) and $S = S_{LM2}$ (equation (3), stars).



4. Large count numbers 

4.1. Li & Ma

The procedure by Li & Ma is designed for the case of large count numbers. This is explicitly mentioned in their paper (Li & Ma 1983) and it becomes apparent if one reparametrizes equation (4) in the following two variables:

$$N_{BG} = \alpha N_{off} , \tag{31}$$

$$r = \frac{N_{on} - N_{BG}}{N_{BG}} . \tag{32}$$

Here, $N_{BG}$ is the count number expected in the on-region when no source is present and $r$ is the ratio of excess counts to the expected background. A positive significance requires $r > 0$. Expressing $S_{LM}$ in the observables $(N_{BG}, r)$ gives

$$S_{LM} = \sqrt{2}\,\sqrt{N_{BG}}\,\left( (1+r) \ln\frac{(1+r)(1+\alpha)}{1+\alpha+\alpha r} + \frac{1}{\alpha} \ln\frac{1+\alpha}{1+\alpha+\alpha r} \right)^{1/2} . \tag{33}$$

Hence, $S_{LM}$ grows proportional to $\sqrt{N_{BG}}$ as one would expect for significance. The point is that no other dependencies on $N_{BG}$ are present, as the rest of equation (33) depends on $r$ and $\alpha$ only.



4.2. Bayes 

For the sake of comparison we must bring the Bayesian significance into the same form, such that its dependence on $N_{BG}$ is the same as for $S_{LM}$. That means that one has to take the limit of large $N_{BG}$.

We can approximate the posterior distribution (eq. (27)) by a Gaussian for large count numbers. The apparent advantage is that this distribution can be treated analytically. The approximation is done best in the parameter in which the measure is uniform. Then the model and the posterior distributions are proportional to each other. Inspecting equation (24) shows that this happens for the parameter

$$\phi = \arcsin\left(\sqrt{\omega}\right) . \tag{34}$$

The approximation is calculated in appendix E. Using $\phi_0 = \arctan\left(\sqrt{N_{on}/N_{off}}\right)$ the result is

$$P_2(\phi|N_{on}; N) = \frac{1}{\mathcal{N}_2}\,\sqrt{\frac{2N}{\pi}}\,\exp\left(-2N(\phi - \phi_0)^2\right) . \tag{35}$$

If $\mathcal{N}_2 = 1$, then $P_2$ is normalized in $]-\infty, \infty[$. The additional normalization factor $\mathcal{N}_2$ is due to the limited definition region of $\omega$, which means that $\phi$ is limited to

$$\phi_{min} = \arcsin\left(\sqrt{\omega_{min}}\right) < \phi < \frac{\pi}{2} . \tag{36}$$

It is handy to define the probability $K_2$ as if $\phi$ was defined on the entire real axis:

$$K_2 = \left. \int_{\phi_{min}}^{\phi_{up}} P_2(\phi|N_{on}; N)\,d\phi \,\right|_{\mathcal{N}_2 = 1} . \tag{37}$$

The value of $K_2$ is

$$K_2 = \mathrm{erf}\left(\sqrt{2N}\,(\phi_0 - \phi_{min})\right) . \tag{38}$$

The corresponding significance $S_2$ can easily be given as an analytical expression:

$$S_2 = 2\sqrt{N}\,(\phi_0 - \phi_{min}) = 2\sqrt{N}\,\arcsin\frac{\sqrt{N_{on}} - \sqrt{\alpha N_{off}}}{\sqrt{(1+\alpha)N}} = 2\sqrt{N_{BG}}\,\sqrt{\frac{1+\alpha+\alpha r}{\alpha}}\;\arcsin\frac{\sqrt{\alpha+\alpha r} - \sqrt{\alpha}}{\sqrt{(1+\alpha)(1+\alpha+\alpha r)}} . \tag{39}$$

The actual factor $\mathcal{N}_2$ will differ from unity. It is found by the condition

$$1 = \int_{\phi_{min}}^{\pi/2} P_2(\phi|N_{on}; N)\,d\phi . \tag{40}$$

For large count numbers the relevant range of $\phi$ is close to the position of the maximum, i.e. $\phi_0$. A crucial property of $P_2$ is that it does not vanish at $\phi_{min}$. The value of $\phi_0$ is not far from $\phi_{min}$. Therefore one can show that the upper limit of the integration in equation (40) can be replaced by infinity, as the corresponding correction vanishes exponentially with growing count numbers. Then one obtains

$$\mathcal{N}_2 = \frac{1}{2}\,(1 + K_2) . \tag{41}$$

Note that $\mathcal{N}_2$ is close to unity, and it is necessarily smaller than unity. With the additional normalization factor $\mathcal{N}_2$ the integration over $P_2$ gives the Bayesian probability $K_B$ in our approximation. Using the fact that $(1 - K_2) \ll 1$ one gets

$$K_B = \frac{1}{\mathcal{N}_2}\,K_2 = \frac{2K_2}{1+K_2} = \frac{1-(1-K_2)}{1-(1-K_2)/2} \approx 1 - \frac{1-K_2}{2} . \tag{42}$$

Going to the significance scale we have

$$\mathrm{erf}\left(\frac{S_B}{\sqrt{2}}\right) = 1 - \frac{1-K_2}{2} = \frac{1+K_2}{2} . \tag{43}$$

Using equation (16) one gets

$$\frac{1}{S_B}\,\exp\left(-\frac{S_B^2}{2}\right) \approx \frac{1}{2 S_2}\,\exp\left(-\frac{S_2^2}{2}\right) . \tag{44}$$

Setting $S_B = (1+\delta) S_2$ and neglecting higher orders of $\delta$ yields

$$S_B \approx S_2 \left(1 + \frac{\ln 2}{1 + S_2^2}\right) . \tag{45}$$

The second term in this formula is due to the limited definition region of the source intensity parameter $\lambda_s$. With equation (39) one sees that its contribution becomes negligible for large $N_{BG}$, as it vanishes like $1/N_{BG}$. Then one simply has $S_B = S_2$, which is plausible, as for large count numbers the distribution will become more and more concentrated around its maximum and therefore in the limit the definition region of the parameter no longer has an effect. So $S_2$ is the Bayesian expression which can be compared to the Li & Ma significance as given in equation (33). Evidently, Bayesian inference and classical statistics yield different estimates for the significance even in this limit.
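The large-count expressions of this section are simple enough to evaluate directly. The following Python sketch is an illustration under stated assumptions: `s2_large_counts` implements $S_2 = 2\sqrt{N}(\phi_0 - \phi_{min})$ from eq. (39), and `s_b_approx` applies the correction factor $1 + \ln 2 / (1 + S_2^2)$, which is the form of eq. (45) that follows from the derivation above and that reproduces the 6.9% correction at $S = 3$ quoted in sect. 5.3. Both function names are chosen here.

```python
import math


def s2_large_counts(n_on, n_off, alpha):
    """Eq. (39): Gaussian-limit Bayesian significance 2 sqrt(N) (phi_0 - phi_min)."""
    n = n_on + n_off
    phi_0 = math.atan(math.sqrt(n_on / n_off))
    phi_min = math.asin(math.sqrt(alpha / (1.0 + alpha)))
    return 2.0 * math.sqrt(n) * (phi_0 - phi_min)


def s_b_approx(n_on, n_off, alpha):
    """S_2 corrected for the restricted parameter range (assumed form of eq. (45))."""
    s2 = s2_large_counts(n_on, n_off, alpha)
    return s2 * (1.0 + math.log(2.0) / (1.0 + s2 * s2))


if __name__ == "__main__":
    # Illustrative numbers only: alpha = 0.25, N_off = 2000.
    for n_on in (520, 560, 600):
        print(n_on, s2_large_counts(n_on, 2000, 0.25), s_b_approx(n_on, 2000, 0.25))
```

The relative size of the correction depends on $S_2$ only, so it shrinks as the significance grows.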
5. Large count numbers and weak source

Typically, in gamma ray astronomy the detected sources are at the limit of the instruments' sensitivities. Therefore long observation times are common. Thus the typical case is a weak source and large count numbers. The additional request of a weak source is expressed by the condition $r \ll 1$. In this limit the two significances actually do agree.

5.1. Li & Ma

Expanding the result in equation (33) up to the second order with respect to $r$ at $r = 0$ gives

$$S_{LM} \approx r\,\sqrt{\frac{N_{BG}}{\alpha+1}}\,\left(1 - \frac{2\alpha+1}{6(\alpha+1)}\,r\right) . \tag{46}$$

The expansion is done up to the order in which we encounter a difference to the Bayesian significance. Equation (46) is useful for small values of $r$. The first order term is sufficient if one requires that the second order term is small compared to the leading order. This gives the condition of how weak the source must be in that case:

$$r \ll \frac{6(1+\alpha)}{1+2\alpha} . \tag{47}$$

5.2. Bayes 

Expanding the Bayesian result for large $N_{BG}$, hence $S_2$ in equation (39), up to the second order with respect to $r$ at $r = 0$ gives

$$S_2 \approx r\,\sqrt{\frac{N_{BG}}{\alpha+1}}\,\left(1 - \frac{1}{4}\,r\right) . \tag{48}$$

The first order is sufficient if $r \ll 1/4$.

5.3. Comparison 

To first order in $r$, the formula given by Li & Ma agrees with the Bayesian result. The difference between the two significances is of second order in $r$:

$$S_2 - S_{LM} \approx r\,\sqrt{\frac{N_{BG}}{\alpha+1}}\,\left(\frac{1}{12}\,\frac{\alpha-1}{\alpha+1}\,r\right) . \tag{49}$$

The numerical value of the fraction $(\alpha - 1)/(\alpha + 1)$ is always in $[-1, 1]$. Together with the factor 1/12 one finds therefore that the relative difference in significance is typically an order of magnitude smaller than the value of $r$. For $\alpha = 1$ this relative difference is of order $r^2$. This shows that in the case of large count numbers and a weak source the Bayesian result and the formula given by Li & Ma are very close to each other.
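The agreement to first order and the size of the second-order difference can be checked numerically from the expressions in the variables $(N_{BG}, r)$. The sketch below is a Python illustration with function names chosen here; it evaluates eq. (33) and eq. (39) exactly and compares their difference with the second-order term $\frac{1}{12}\frac{\alpha-1}{\alpha+1} r^2 \sqrt{N_{BG}/(\alpha+1)}$ of eq. (49). The parameter values in the demo are assumptions.

```python
import math


def s_lm_exact(n_bg, r, alpha):
    """Eq. (33): Li & Ma significance in the variables (N_BG, r)."""
    a = 1.0 + alpha + alpha * r
    inner = ((1.0 + r) * math.log((1.0 + r) * (1.0 + alpha) / a)
             + math.log((1.0 + alpha) / a) / alpha)
    return math.sqrt(2.0 * n_bg * inner)


def s2_exact(n_bg, r, alpha):
    """Eq. (39): large-count Bayesian significance in the variables (N_BG, r)."""
    a = 1.0 + alpha + alpha * r
    arg = (math.sqrt(alpha + alpha * r) - math.sqrt(alpha)) / math.sqrt((1.0 + alpha) * a)
    return 2.0 * math.sqrt(n_bg * a / alpha) * math.asin(arg)


def difference_second_order(n_bg, r, alpha):
    """Eq. (49): leading term of S_2 - S_LM for a weak source."""
    return r * math.sqrt(n_bg / (alpha + 1.0)) * (alpha - 1.0) / (alpha + 1.0) * r / 12.0


if __name__ == "__main__":
    n_bg, alpha = 500.0, 0.3
    for r in (0.01, 0.05, 0.1):
        exact = s2_exact(n_bg, r, alpha) - s_lm_exact(n_bg, r, alpha)
        print(r, exact, difference_second_order(n_bg, r, alpha))
```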

Interestingly the correction due to the limited definition region (second term in equation (45)) is often numerically more important than the intrinsic difference between the two results as given by formula (49). For the case of $r = 0.1$, $\alpha = 0.3$ and a typical significance of $3\sigma$ the difference according to equation (49) is only of order 0.4%, whereas the limited definition region changes the significance by 6.9%. The correction by the restricted definition region is more important than the intrinsic difference given by the mathematically different treatment as long as

$$S_2^2\, r < 12 \ln 2\; \left|\frac{\alpha+1}{\alpha-1}\right| . \tag{50}$$

This case is relevant since the actual limit of large count numbers is hard to reach and it quickly leads to significances which are so high that one could not doubt the existence of a source. If condition (50) is fulfilled, the difference between the two significances is dominated, technically speaking, by the definition region. The interesting point is that an unrestricted definition region would allow a source with negative intensity. Here, physics tells us that a source can only increase the count number since the source does not interfere with the background. In other words: an intensity always has a value $\geq 0$. One sees how Bayesian statistics allows us to take into account a priori knowledge via the definition region. In classical statistics a priori knowledge is not taken into account. Implicitly the intensity parameter of the source is completely free in $]-\infty, \infty[$.

6. Conclusions 



The decision about a signal in the presence of background has been considered by Li & Ma in the framework of classical statistics. We have presented the Bayesian treatment of the same problem. This yields a complete solution which is not restricted to large count numbers. The Bayesian significance is correct for any $N_{on}$, $N_{off}$.

Fig. 3. Difference $\Delta S = S - S_{LM}$ for $\alpha = 0.3$, $r = 1/3$ as a function of $S_{LM}$. The significance $S = S_B$ (dots) is moderately higher than the one given by Li & Ma (squares). The curve shows $S = S_B$ as calculated from the approximation given in equation (45). Note that the Bayesian procedure can only be evaluated for integer count numbers, not allowing for a continuous coverage on the $S_{LM}$-axis. The off count number $N_{off}$ varies from 0 to 2000 for $S_{LM}$ from 0 to 7.5.

We compared the significance by Li & Ma with the Bayesian one in the limit of large count numbers. This was dictated by the fact that Li & Ma have formulated their expression for that limit. It turns out that classical statistics and Bayesian inference generally yield different results. They agree, however, in the limit of large count numbers and a weak source.

There are interesting cases where the limit of large count numbers is not fully reached. Then an accurate representation of the Bayesian significance requires a correction of order $N^{-1/2}$ as compared to the leading term which is of order $N^{1/2}$. There is no room for it in the argument of Li & Ma. The correction is due to the fact that a physical intensity parameter cannot have negative values. Bayesian inference takes care of this piece of prior knowledge.



Appendix A: Calculation of measures 

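The evaluation of equation (9) is easy using the expectation values for the respective distribution. For the Poisson distribution one has

$$\langle n \rangle_P = \lambda , \qquad \langle n^2 \rangle_P = \lambda^2 + \lambda . \tag{A.1}$$

For the binomial distribution the expectation values are

$$\langle n \rangle_B = N\lambda , \qquad \langle n^2 \rangle_B = N\lambda\,((N-1)\lambda + 1) . \tag{A.2}$$

The measure of the Poisson distribution is therewith:

$$\ln p_P(n|\lambda) = n \ln\lambda - \lambda - \ln n! ,$$

$$\partial_\lambda \ln p_P(n|\lambda) = \frac{n}{\lambda} - 1 ,$$

$$\left\langle (\partial_\lambda \ln p_P(n|\lambda))^2 \right\rangle_{p_P} = 1 - 2 + \frac{1+\lambda}{\lambda} = \frac{1}{\lambda} ,$$

$$\mu_P(\lambda) = \lambda^{-1/2} . \tag{A.3}$$

The measure of the binomial distribution is

$$\ln p_B(n|\lambda; N) = \ln\binom{N}{n} + n \ln\lambda + (N-n) \ln(1-\lambda) ,$$

$$\partial_\lambda \ln p_B(n|\lambda; N) = \frac{n}{\lambda} - \frac{N-n}{1-\lambda} ,$$

$$\left\langle (\partial_\lambda \ln p_B(n|\lambda; N))^2 \right\rangle_{p_B} = \frac{N}{\lambda(1-\lambda)} ,$$

$$\mu_B(\lambda) = \left(\frac{N}{\lambda(1-\lambda)}\right)^{1/2} . \tag{A.4}$$

The conditional measure $\mu_0(\lambda_{off}|\lambda_s)$ needed in equation (20) is calculated in the same way, using the expectation values for the Poisson distribution:

$$\ln p_0(N_{on}, N_{off}|\lambda_s, \lambda_{off}) = N_{on} \ln(\lambda_s + \alpha\lambda_{off}) - \lambda_s - \alpha\lambda_{off} + N_{off} \ln\lambda_{off} - \lambda_{off} - \ln(N_{off}!\,N_{on}!) ,$$

$$\partial_{\lambda_{off}} \ln p_0(N_{on}, N_{off}|\lambda_s, \lambda_{off}) = \frac{\alpha N_{on}}{\lambda_{on}} - \alpha + \frac{N_{off}}{\lambda_{off}} - 1 ,$$

$$\left\langle (\partial_{\lambda_{off}} \ln p_0(N_{on}, N_{off}|\lambda_s, \lambda_{off}))^2 \right\rangle_{p_0} = \frac{\alpha^2}{\lambda_{on}} + \frac{1}{\lambda_{off}} ,$$

$$\mu_0(\lambda_{off}|\lambda_s) = \left(\frac{\alpha^2}{\alpha\lambda_{off} + \lambda_s} + \frac{1}{\lambda_{off}}\right)^{1/2} . \tag{A.5}$$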

Appendix B: Check if the minor model is proper 

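It has to be checked whether the model $q_0$ in equation (20) is proper. Thus one has to evaluate

$$\sum_{N_{on}, N_{off}} q_0(N_{on}, N_{off}|\lambda_s) = \int_0^\infty \sum_{N_{on}, N_{off}} p_0(N_{on}, N_{off}|\lambda_s, \lambda_{off})\,\mu_0(\lambda_{off}|\lambda_s)\,d\lambda_{off} = \int_0^\infty \mu_0(\lambda_{off}|\lambda_s)\,d\lambda_{off} = \int_0^\infty \left(\frac{\alpha^2}{\alpha\lambda_{off} + \lambda_s} + \frac{1}{\lambda_{off}}\right)^{1/2} d\lambda_{off} . \tag{B.1}$$

This integral diverges and hence $q_0(N_{on}, N_{off}|\lambda_s)$ is an improper model.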

Appendix C: Transformation to a proper model 

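The transformation from the original parameters $(\lambda_s, \lambda_{off})$ to the new ones $(\omega, \lambda)$ is calculated in a few lines:

$$p_0(N_{on}, N_{off}|\lambda_s, \lambda_{off}) = \frac{\lambda_{on}^{N_{on}}}{N_{on}!}\,e^{-\lambda_{on}} \cdot \frac{\lambda_{off}^{N_{off}}}{N_{off}!}\,e^{-\lambda_{off}} = \frac{(\omega\lambda)^{N_{on}}\,((1-\omega)\lambda)^{N_{off}}}{N_{on}!\,N_{off}!}\,e^{-\lambda} = \frac{\lambda^N e^{-\lambda}}{N!} \cdot \frac{N!}{N_{on}!\,N_{off}!}\,\omega^{N_{on}} (1-\omega)^{N_{off}} = p_P(N|\lambda) \cdot p_B(N_{on}|\omega; N) . \tag{C.1}$$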
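The factorization (C.1) can be verified numerically for any admissible parameter values. The Python sketch below is an illustration, with helper names and the test values chosen here; it compares the logarithm of the product of the two Poisson distributions with the logarithm of the Poisson-times-binomial form.

```python
import math


def log_poisson(n, lam):
    """ln p_P(n | lam), eq. (5)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_binomial(n_on, omega, n):
    """ln p_B(n_on | omega; n), eq. (6)."""
    return (math.lgamma(n + 1) - math.lgamma(n_on + 1) - math.lgamma(n - n_on + 1)
            + n_on * math.log(omega) + (n - n_on) * math.log(1.0 - omega))


def both_sides(n_on, n_off, lam_s, lam_off, alpha):
    """Return the logarithms of the left and right hand side of eq. (C.1)."""
    lam_on = alpha * lam_off + lam_s   # eq. (18)
    lam = lam_on + lam_off             # eq. (21)
    omega = lam_on / lam
    lhs = log_poisson(n_on, lam_on) + log_poisson(n_off, lam_off)
    rhs = log_poisson(n_on + n_off, lam) + log_binomial(n_on, omega, n_on + n_off)
    return lhs, rhs


if __name__ == "__main__":
    print(both_sides(16, 10, 5.0, 12.0, 0.25))
```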



Appendix D: Uniqueness of the solution 

The first derivative of the binomial model $p_B$ with respect to $\omega$ is

$$p_B'(\omega|N_{on}; N) = \left(\frac{N_{on}}{\omega} - \frac{N_{off}}{1-\omega}\right) p_B(\omega|N_{on}; N) . \tag{D.1}$$

It vanishes at

$$\omega_0 = \frac{N_{on}}{N} . \tag{D.2}$$

The value of the second derivative at $\omega_0$ is

$$p_B''(\omega_0|N_{on}; N) = -\frac{N^3}{N_{off}\,N_{on}}\; p_B(\omega_0|N_{on}; N) < 0 . \tag{D.3}$$

Hence, $p_B$ has a single maximum and no minima. Therefore for each $\omega_1 \neq \omega_0$ one has exactly one other $\omega_2$ for which equation (29) holds. Thus one has a unique solution $\omega_{up} \neq \omega_{min}$ in equation (29).

Appendix E: Approximation to the posterior distribution 

The result of the transformation of $p_B(\omega)$ to the parameter $\phi = \arcsin(\sqrt{\omega})$ is

$$P_1(\phi) \sim (\sin\phi)^{2N_{on}} (\cos\phi)^{2N_{off}} . \tag{E.1}$$

The approximation is achieved by expanding the logarithm of the distribution around its maximum and taking the exponential of the result. Using $\gamma = N_{on}/N_{off} = \alpha(1 + r)$ the maximum is at

$$\phi_0 = \arctan\left(\sqrt{\gamma}\right) . \tag{E.2}$$

The expansion up to second order is

$$\ln P_1(\phi) = c_1(N_{on}, N) - 2N (\phi - \phi_0)^2 + O\left[(\phi - \phi_0)^3\right] . \tag{E.3}$$

Hence, one has

$$P_2 = \frac{1}{c_2}\,\exp\left(-2N(\phi - \phi_0)^2\right) . \tag{E.4}$$

With the normalization constant

$$c_2 = \sqrt{\frac{\pi}{2N}} \tag{E.5}$$

the distribution $P_2$ is normalized in $]-\infty, \infty[$.



Appendix F: Mathematica script to evaluate Bayesian significance 

Although we cannot give a closed formula for the Bayesian significance, we can show a short Mathematica script which calculates the significance as given in equation (30).

data = { a -> 0.25,
         non -> 16,
         noff -> 10 };

n = non + noff;
b = non/noff;
wmin = a/(1 + a);   (* lower bound of omega, eq. (22) *)

(* binomial model, eq. (6), and model times measure, eq. (24) *)
pBin[x_, n_, non_] := Binomial[n, non] x^non (1 - x)^(n - non);
pRaw[x_, n_, non_] := pBin[x, n, non] Sqrt[n/(x (1 - x))];
norm = Integrate[pRaw[x, n, non], {x, wmin, 1}];
p[x_] := pRaw[x, n, non]/norm;

(* upper bound of the Bayesian interval, eq. (29) *)
rule = FindRoot[Evaluate[(1 - w) (1 + a) == (wmin/w)^b /. data],
                {w, wmin/a, non/n, 1} /. data];
i[w0_, w1_] := Integrate[p[w], {w, w0, w1}, GenerateConditions -> False];
temp = Evaluate[(i[wmin, w /. rule]) /. data];

Print["Sigma (Bayes): "];
sigma = InverseErf[temp] Sqrt[2]